THE DISTRIBUTION OF TEST SCORES*


MERRILL ROFF 
Indiana University 


It is well known that the form of the distribution of scores on a 
test of mental ability varies with the difficulty of the constituent test 
items for the population tested. Thus, a test which is “too easy” for 
a population will give a distribution which is skewed toward the 
lower end, and a test which is “too hard” for a group will give 
a distribution skewed toward the upper end. A symmetrical, bell- 
shaped, or, as a special case, a normal distribution, can thus be 
obtained only when a test is appropriate in difficulty to the group 
tested. This set of facts raises the question of the extent to which 
any distribution of test scores is a result of the measuring instrument, 
the specific set of test items used; that is, will the use of an appropriate 
set of test items yield a particular distribution of scores more or less 
without regard for an assumed distribution of “real ability” in the 
population tested? 

One form in which this problem appears is described by Thorndike 
(2, p. 251) with relation to selected populations, as follows: “What- 
ever be the form of distribution of n individuals in respect of any 
trait, the selection of N individuals from the n on the basis of any- 
thing related to the amount of that trait in any other way than by 
chance will cause the form of the distribution of the N to differ from 
that of the n. Consequently, whatever be the form of the distribu- 
tion of a trait in the general population, its form in bad men, 
educated men, entrepreneurs, slaves, political leaders, criminals,
lawyers, or any selected group is likely to be different, and is sure 
to be so if the selection bears any relation directly or indirectly to the 
amount of the trait in question. Such selective action is likely to cause 
asymmetry and even rather abrupt truncation.”

However, earlier Thorndike (1) found that under conditions in 
which distributions of test scores would give a close fit to a normal 
curve, there was no significant difference in the closeness to normality 
of distribution between populations of sixth grade, ninth grade, 
twelfth grade and college freshman students, although with increasing 
school level there is a progressive selection on the basis of ability; and 
it is generally found that test scores from college populations are
symmetrical enough. This suggests that the effect of selection might 
not show up at all in the distribution of scores from a selected 
population if an appropriate set of test items were used. 

More generally, if a distribution of scores should be normal or
bell-shaped, what inference is justified concerning the form of any
distribution of “true ability” underlying performance on the test? 
Could a more or less normal distribution of scores occur only as a 
result of a more or less normal distribution of “true ability”? Or
could a symmetrical, bell-shaped score distribution occur with an 
appropriate set of test items if the assumed “true ability” were skew, 
rectangular, or even J-shaped in distribution? It is to obtain informa- 
tion on these questions that the work described here was done. 

The test used was the opposites test of the 1934 A.C.E. psycho- 
logical examination, which was taken by 867 entering male students. 
The test contained 27 items; the distribution of total scores is given in 
Table I. The difficulty of the individual test items ranged from 


TABLE I 


DISTRIBUTION OF SCORES OF 867 MEN STUDENTS ON 
OPPOSITES TEST


Score Frequency Score Frequency 
33 
3 3 16 64 
4 9 17 46
5 11 18 67
6 13 19 58 
7 17 20 38 
8 29 21 40 
9 48 22 29
10 45 23 28 
11 46 24 15 
12 47 25 22 
13 54 26 15 
14 62 27 5 


88 per cent to 12 per cent right with a median item difficulty of 
59 per cent. From this population of 867, the 114 highest cases 
were selected; this included all persons with scores of 22 right or 
better. The distribution of scores of these persons on the total test 
is shown on the left side of Figure 1; it is a J-shaped distribution
obtained by truncating the total distribution at the upper end. For 
these 114 persons, scores were compiled on a sub-test composed of
the five most difficult items from the original twenty-seven. Item
difficulties for these five items for the total population and for this 
selected population are given in Table II. The median difficulty for 


TABLE II


ITEM DIFFICULTIES OF SUB-TEST FOR TOTAL
POPULATION AND FOR SELECTED GROUP


Per cent answering correctly


Item Total population 114 highest persons
21 31 66
24 18 59
25 12 48
26 33 78
27 12 52


the selected group is 59 per cent. The distribution of scores on this
sub-test is given in Table III and shown on the right side of Figure 1.


TABLE III 
SCORES OF 114 HIGHEST PERSONS ON SUB-TEST 
Score Frequency 
0 0 
1 9 
2 32 
3 31 
4 29
5 13 


These scores show no marked skewness, and there is nothing about
the distribution to indicate that the distribution of these
individuals' scores on the complete test was J-shaped.
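
The contrast can be put in rough numerical terms. The following Python sketch computes a moment coefficient of skewness for the two distributions of the 114 selected persons, using the frequencies for scores 22 to 27 from Table I and the frequencies of Table III. The choice of this coefficient as the index of asymmetry, and the function name, are assumptions made for illustration and are not part of the analysis reported here; the Table III frequency of 29 is taken to belong to a score of 4.

```python
def skewness(scores, freqs):
    """Moment coefficient of skewness, m3 / m2**1.5, for a frequency table."""
    n = sum(freqs)
    mean = sum(s * f for s, f in zip(scores, freqs)) / n
    m2 = sum(f * (s - mean) ** 2 for s, f in zip(scores, freqs)) / n
    m3 = sum(f * (s - mean) ** 3 for s, f in zip(scores, freqs)) / n
    return m3 / m2 ** 1.5

# Total-test scores of the 114 highest persons (Table I, scores 22 to 27).
total_scores = [22, 23, 24, 25, 26, 27]
total_freqs = [29, 28, 15, 22, 15, 5]

# Sub-test scores of the same persons (Table III, scores 0 to 5).
sub_scores = [0, 1, 2, 3, 4, 5]
sub_freqs = [0, 9, 32, 31, 29, 13]

skew_total = skewness(total_scores, total_freqs)
skew_sub = skewness(sub_scores, sub_freqs)
print(f"total test: {skew_total:.2f}  sub-test: {skew_sub:.2f}")
```

The coefficient for the sub-test comes out much nearer zero than that for the truncated total-test scores, which agrees with the visual impression described above.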

A similar procedure was followed for a population from the 
low end of the scale on the total test—the 133 cases with scores of 
9 or less. Scores were calculated for them on a sub-test of five easy
items, which had item difficulties for the total population and for 
the 133 low cases as given in Table IV. The distribution of scores 
TABLE IV


ITEM DIFFICULTIES OF SUB-TEST FOR TOTAL 
POPULATION AND FOR LOW GROUP 


Per cent answering correctly 


Item Total population 133 lowest persons 
3 77 30 
4 85 46
5 87 58
6 86 63 
11 82 53 


for these 133 persons on the sub-test is given in Table V, and the
two distributions are shown graphically in Figure 2. Here again the 


TABLE V 
SCORES OF 133 LOWEST PERSONS ON SUB-TEST 
Score Frequency 
0 14 
1 17 
2 31 
3 35 
4 29 
5 7 


distribution of the sub-test scores is very much closer to a bell-shaped 
symmetrical curve than to the very markedly J-shaped curve of 
scores on the total test. 


Since so large a proportion of cases in the lower group had scores 
of 9, and since the item difficulties would shift somewhat if those 
persons were eliminated, the above procedure was repeated for the 
85 persons with scores of 8 or less, on the same easy sub-test. The 
item difficulties for this group were 24, 39, 45, 58, and 46. The 
resulting frequencies are 13, 15, 24, 19, 12, 2 for scores of 0 to 5,
respectively. Here there is no indication that the distribution of total 
scores was so definitely J-shaped. 
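
As an arithmetic check on the figures above, the mean score on a sub-test must equal the sum of the proportions answering each item correctly. The Python sketch below applies that identity to the three groups described; the variable and function names are illustrative assumptions, and small discrepancies are to be expected because the item difficulties are reported in whole per cents. The Table III frequency of 29 is taken to belong to a score of 4.

```python
# Each entry: (item difficulties in per cent, frequencies for scores 0..5).
groups = {
    "114 highest (Tables II, III)": ([66, 59, 48, 78, 52], [0, 9, 32, 31, 29, 13]),
    "133 lowest (Tables IV, V)": ([30, 46, 58, 63, 53], [14, 17, 31, 35, 29, 7]),
    "85 lowest (text)": ([24, 39, 45, 58, 46], [13, 15, 24, 19, 12, 2]),
}

def mean_score(freqs):
    """Mean of a frequency table whose scores run 0, 1, 2, ..."""
    return sum(score * f for score, f in enumerate(freqs)) / sum(freqs)

def expected_mean(per_cents):
    """Sum of item proportions correct, which equals the mean score."""
    return sum(per_cents) / 100

for label, (per_cents, freqs) in groups.items():
    print(f"{label}: n={sum(freqs)}, "
          f"mean from frequencies={mean_score(freqs):.2f}, "
          f"sum of item proportions={expected_mean(per_cents):.2f}")
```

The two figures for each group agree to within a few hundredths of a score point, and the frequencies sum to the group sizes stated in the text.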


No formal tests of significance of the distributions are applied here,
since the groups of five items are short and the strict normality of
distributions is of less interest than their more or less symmetrical
character. Further work is in progress on a test with a larger number
of items which will permit the construction of longer sub-tests; it is 
hoped this work will provide the basis for more specific statements 
about the relation between item characteristics and type of distribution. 

From the theoretical standpoint the above results are not un- 
expected, for the situation we have with scores on a set of n test 
items is essentially the same as that of the distribution of means from 
a parent population in sampling theory. A variety of studies have 
shown that populations which are skew, rectangular, or J-shaped will 
have normal or near-normal distributions of means with samples as 
small as ten cases, and inevitably give a normal distribution of means
when the samples are large.
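
The sampling-theory result can be illustrated by simulation. The Python sketch below draws samples of ten cases from an exponential population, chosen here only as a convenient J-shaped example (the studies referred to are not identified in the text), and compares the skewness of the individual cases with that of the sample means. The number of samples, the seed, and the function names are arbitrary assumptions.

```python
import random

def skewness(values):
    """Moment coefficient of skewness, m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def simulate(sample_size=10, n_samples=20000, seed=0):
    """Return (skewness of individual cases, skewness of sample means)."""
    rng = random.Random(seed)
    cases, means = [], []
    for _ in range(n_samples):
        # One sample of independent draws from a J-shaped population.
        sample = [rng.expovariate(1.0) for _ in range(sample_size)]
        cases.extend(sample)
        means.append(sum(sample) / sample_size)
    return skewness(cases), skewness(means)

skew_cases, skew_means = simulate()
print(f"cases: {skew_cases:.2f}  means of 10: {skew_means:.2f}")
```

For independent draws the skewness of a mean of n cases is the population skewness divided by the square root of n, so the second figure should be roughly a third of the first.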

The factor that prevents complete identity between test scores and 
means of samples and thus lends point to the empirical study made 
here is the dependence between items of a test. If all the items 
were independent we would expect test scores to approach symmetry 
or normality even if the items were relatively easy or difficult. 
Dependence is primarily responsible for such skewness as is not due 
to chance. The results given above lend support to the belief that 
the amount of dependence between test items is not such as to 
interfere with the production of a more or less symmetrical distribu- 
tion of total scores where the constituent items are appropriate and 
somewhat similar in difficulty, regardless, within rather wide limits, 
of the type of assumed distribution of underlying ability. For 
example, a rectangular distribution of underlying ability would be 
less extreme than the J-shaped forms described above, and conse- 
quently could yield a symmetrical, bell-shaped distribution of scores, 
providing the set of items was appropriate in difficulty. 
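
What independence would imply can be worked out exactly for a short sub-test. The Python sketch below computes the score distribution for five independent items having the difficulties given in Table II for the selected group, and sets the expected frequencies beside those observed in Table III. Treating the reported per cents as exact probabilities, and the helper name, are assumptions for illustration; the Table III frequency of 29 is taken to belong to a score of 4.

```python
def score_distribution(probs):
    """Distribution of the number right for independent items.

    dist[k] is the probability of exactly k items right.
    """
    dist = [1.0]
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, prob_k in enumerate(dist):
            nxt[k] += prob_k * (1 - p)   # this item wrong
            nxt[k + 1] += prob_k * p     # this item right
        dist = nxt
    return dist

item_probs = [0.66, 0.59, 0.48, 0.78, 0.52]   # Table II, 114 highest persons
observed = [0, 9, 32, 31, 29, 13]             # Table III
n = sum(observed)

expected = [n * q for q in score_distribution(item_probs)]
for score, (obs, exp) in enumerate(zip(observed, expected)):
    print(f"score {score}: observed {obs:3d}, expected if independent {exp:6.1f}")
```

Differences between the two columns reflect both sampling fluctuation and whatever dependence exists among the items; with only five items the comparison is necessarily rough.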

All this is not to argue that “true abilities” are distributed in any 
particular way, but rather to indicate the difficulty of making 
inferences about distributions of “true ability” from sets of scores 
from psychological tests, and, beyond that, to suggest the difficulties
inherent in talking about “true abilities” at all except as they be 
defined with reference to some specific observational or test procedure. 